{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Baidu API: Table Text Recognition (Asynchronous Interface)\n",
    "[Technical documentation](https://cloud.baidu.com/doc/OCR/s/Ik3h7y238#%E8%8E%B7%E5%8F%96%E7%BB%93%E6%9E%9C%E6%8E%A5%E5%8F%A3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Obtaining the API Key and Secret Key"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Log in to the Baidu AI Open Platform\n",
    "[Log in to the console](https://console.bce.baidu.com/) -> Text Recognition (文字识别) -> Create Application -> copy the API_KEY and API_SECRET<br>\n",
    "![Console](百度AI控制台.png)<br>\n",
    "![key](key.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Obtaining the access_token\n",
    "[Authentication](https://ai.baidu.com/ai-doc/REFERENCE/Ck3dwjhhu)<br>\n",
    "![Authentication](鉴权.png)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'refresh_token': '25.ab99bf59b118fcbc7fe40b710bd603ba.315360000.1923136708.282335-23080480', 'expires_in': 2592000, 'session_key': '9mzdX+/GaJJyvLaMEEWf+CR5CfPguhuSWcnRfcnNb9sLT/KFFFxsMq0nLWAnhlvvu5thSX4+dWjiD6dDPwmi/uyQjpupLA==', 'access_token': '24.d5cfcbfe365011725ca5e293bb0eadc9.2592000.1610368708.282335-23080480', 'scope': 'public vis-ocr_ocr brain_ocr_scope brain_ocr_general brain_ocr_general_basic vis-ocr_business_license brain_ocr_webimage brain_all_scope brain_ocr_idcard brain_ocr_driving_license brain_ocr_vehicle_license vis-ocr_plate_number brain_solution brain_ocr_plate_number brain_ocr_accurate brain_ocr_accurate_basic brain_ocr_receipt brain_ocr_business_license brain_solution_iocr brain_qrcode brain_ocr_handwriting brain_ocr_passport brain_ocr_vat_invoice brain_numbers brain_ocr_business_card brain_ocr_train_ticket brain_ocr_taxi_receipt vis-ocr_household_register vis-ocr_vis-classify_birth_certificate vis-ocr_台湾通行证 vis-ocr_港澳通行证 vis-ocr_机动车购车发票识别 vis-ocr_机动车检验合格证识别 vis-ocr_车辆vin码识别 vis-ocr_定额发票识别 vis-ocr_保单识别 vis-ocr_机打发票识别 vis-ocr_行程单识别 brain_ocr_vin brain_ocr_quota_invoice brain_ocr_birth_certificate brain_ocr_household_register brain_ocr_HK_Macau_pass brain_ocr_taiwan_pass brain_ocr_vehicle_invoice brain_ocr_vehicle_certificate brain_ocr_air_ticket brain_ocr_invoice brain_ocr_insurance_doc brain_formula brain_ocr_meter brain_doc_analysis brain_ocr_webimage_loc wise_adapt lebo_resource_base lightservice_public hetu_basic lightcms_map_poi kaidian_kaidian ApsMisTest_Test权限 vis-classify_flower lpq_开放 cop_helloScope ApsMis_fangdi_permission smartapp_snsapi_base smartapp_mapp_dev_manage iop_autocar oauth_tp_app smartapp_smart_game_openapi oauth_sessionkey smartapp_swanid_verify smartapp_opensource_openapi smartapp_opensource_recapi fake_face_detect_开放Scope vis-ocr_虚拟人物助理 idl-video_虚拟人物助理 smartapp_component smartapp_search_plugin', 'session_secret': '5ac42db46a00db1a4deefeb0bb0385f5'}\n"
     ]
    }
   ],
   "source": [
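    "import requests\n",
    "\n",
    "# client_id is the API Key (AK) and client_secret is the Secret Key (SK) from the console\n",
    "host = 'https://aip.baidubce.com/oauth/2.0/token'\n",
    "params = {\n",
    "    'grant_type': 'client_credentials',\n",
    "    'client_id': '[AK obtained from the console]',\n",
    "    'client_secret': '[SK obtained from the console]',\n",
    "}\n",
    "response = requests.get(host, params=params)\n",
    "if response:\n",
    "    print(response.json())"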
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'24.d5cfcbfe365011725ca5e293bb0eadc9.2592000.1610368708.282335-23080480'"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "access_token=response.json()['access_token']\n",
    "access_token"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Converting a Table Image to Excel (Table Text Recognition, Asynchronous Interface)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### [Request](https://cloud.baidu.com/doc/OCR/s/Ik3h7y238#%E6%8F%90%E4%BA%A4%E8%AF%B7%E6%B1%82%E6%8E%A5%E5%8F%A3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# encoding:utf-8\n",
    "\n",
    "import requests\n",
    "import base64\n",
    "\n",
    "'''\n",
    "Table text recognition (asynchronous interface)\n",
    "'''\n",
    "\n",
    "request_url = \"https://aip.baidubce.com/rest/2.0/solution/v1/form_ocr/request\"\n",
    "# Open the image file in binary mode and Base64-encode its contents\n",
    "with open('[local file]', 'rb') as f:\n",
    "    img = base64.b64encode(f.read())\n",
    "\n",
    "params = {\"image\": img}\n",
    "access_token = '[token obtained from the authentication API]'\n",
    "request_url = request_url + \"?access_token=\" + access_token\n",
    "headers = {'content-type': 'application/x-www-form-urlencoded'}\n",
    "response = requests.post(request_url, data=params, headers=headers)\n",
    "if response:\n",
    "    print(response.json())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'result': [{'request_id': '23080480_2311434'}], 'log_id': 1607777076850240}\n"
     ]
    }
   ],
   "source": [
    "import requests\n",
    "import base64\n",
    "\n",
    "'''\n",
    "Table text recognition (asynchronous interface)\n",
    "'''\n",
    "\n",
    "request_url = \"https://aip.baidubce.com/rest/2.0/solution/v1/form_ocr/request\"\n",
    "# Open the image file in binary mode and Base64-encode its contents\n",
    "with open('dujiaoshou.jpeg', 'rb') as f:\n",
    "    img = base64.b64encode(f.read())\n",
    "\n",
    "params = {\"image\": img}\n",
    "# access_token was set from the authentication response in an earlier cell\n",
    "request_url = request_url + \"?access_token=\" + access_token\n",
    "headers = {'content-type': 'application/x-www-form-urlencoded'}\n",
    "response = requests.post(request_url, data=params, headers=headers)\n",
    "if response:\n",
    "    print(response.json())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'23080480_2311434'"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "request_id=response.json()[\"result\"][0][\"request_id\"]\n",
    "request_id"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### [Returned data](https://cloud.baidu.com/doc/OCR/s/Ik3h7y238#%E8%8E%B7%E5%8F%96%E7%BB%93%E6%9E%9C%E6%8E%A5%E5%8F%A3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'result': {'result_data': 'http://bj.bcebos.com/v1/ai-edgecloud/F0F909A4FD5142CEA78651D93BD2B8F6.xls?authorization=bce-auth-v1%2Ff86a2044998643b5abc89b59158bad6d%2F2020-12-12T12%3A44%3A42Z%2F172800%2F%2F46bf592b2e1ccadf4947d2be74167f2586c4164406e0c3db781a0744bb329152',\n",
       "  'ret_msg': '已完成',\n",
       "  'request_id': '23080480_2311434',\n",
       "  'percent': 100,\n",
       "  'ret_code': 3},\n",
       " 'log_id': 1607777633717528}"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Import the module used for network requests\n",
    "import requests\n",
    "\n",
    "param = {\n",
    "    \"access_token\": access_token\n",
    "}\n",
    "headers = {\n",
    "    'content-type': 'application/x-www-form-urlencoded'\n",
    "}\n",
    "body = {\n",
    "    \"request_id\": request_id,\n",
    "    # \"result_type\": 'json',  # optional; Excel is the other option\n",
    "}\n",
    "result_url = \"https://aip.baidubce.com/rest/2.0/solution/v1/form_ocr/get_request_result\"\n",
    "r = requests.post(result_url, data=body, params=param, headers=headers)\n",
    "r.json()"
   ]
  },
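  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because the interface is asynchronous, the result request may have to be repeated until recognition has finished. The cell below is a minimal sketch of such a polling loop, not part of the official sample code. It assumes that `ret_code == 3` marks completion (as in the output above, where `ret_code` is 3 and `percent` is 100) and that a failed call returns an `error_code` field. The helper name `wait_for_result`, the polling interval, and the timeout are hypothetical."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import time\n",
    "import requests\n",
    "\n",
    "RESULT_URL = 'https://aip.baidubce.com/rest/2.0/solution/v1/form_ocr/get_request_result'\n",
    "\n",
    "def wait_for_result(access_token, request_id, interval=2.0, timeout=60.0):\n",
    "    \"\"\"Hypothetical helper: poll the result endpoint until ret_code == 3.\"\"\"\n",
    "    deadline = time.monotonic() + timeout\n",
    "    while True:\n",
    "        resp = requests.post(\n",
    "            RESULT_URL,\n",
    "            params={'access_token': access_token},\n",
    "            data={'request_id': request_id},\n",
    "        )\n",
    "        resp.raise_for_status()\n",
    "        payload = resp.json()\n",
    "        # Assumption: failed calls carry an 'error_code' field instead of 'result'\n",
    "        if 'error_code' in payload:\n",
    "            raise RuntimeError(payload)\n",
    "        result = payload['result']\n",
    "        # Assumption: ret_code 3 means finished, as in the output above\n",
    "        if result.get('ret_code') == 3:\n",
    "            return result\n",
    "        if time.monotonic() >= deadline:\n",
    "            raise TimeoutError('request %s not finished: %s' % (request_id, result))\n",
    "        time.sleep(interval)\n",
    "\n",
    "# Usage (needs a valid token and request_id from the cells above):\n",
    "# wait_for_result(access_token, request_id)['result_data']"
   ]
  },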
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'log_id': 1607777505810386,\n",
      " 'result': {'percent': 100,\n",
      "            'request_id': '23080480_2311434',\n",
      "            'result_data': '{\"form_num\":1,\"forms\":[{\"footer\":[{\"rect\":{\"top\":508,\"left\":0,\"width\":688,\"height\":512},\"column\":[1],\"row\":[1],\"word\":\"\"}],\"header\":[{\"rect\":{\"top\":0,\"left\":0,\"width\":688,\"height\":2},\"column\":[1],\"row\":[1],\"word\":\"\"}],\"body\":[{\"rect\":{\"top\":2,\"left\":2,\"width\":80,\"height\":46},\"column\":[1],\"row\":[1],\"word\":\"序号\"},{\"rect\":{\"top\":48,\"left\":2,\"width\":80,\"height\":46},\"column\":[1],\"row\":[2],\"word\":\"1\"},{\"rect\":{\"top\":94,\"left\":2,\"width\":80,\"height\":46},\"column\":[1],\"row\":[3],\"word\":\"2\"},{\"rect\":{\"top\":140,\"left\":2,\"width\":80,\"height\":46},\"column\":[1],\"row\":[4],\"word\":\"3\"},{\"rect\":{\"top\":186,\"left\":2,\"width\":80,\"height\":46},\"column\":[1],\"row\":[5],\"word\":\"4\"},{\"rect\":{\"top\":232,\"left\":2,\"width\":80,\"height\":48},\"column\":[1],\"row\":[6],\"word\":\"5\"},{\"rect\":{\"top\":280,\"left\":2,\"width\":80,\"height\":44},\"column\":[1],\"row\":[7],\"word\":\"6\"},{\"rect\":{\"top\":324,\"left\":2,\"width\":80,\"height\":46},\"column\":[1],\"row\":[8],\"word\":\"7\"},{\"rect\":{\"top\":370,\"left\":2,\"width\":80,\"height\":46},\"column\":[1],\"row\":[9],\"word\":\"8\"},{\"rect\":{\"top\":416,\"left\":2,\"width\":80,\"height\":48},\"column\":[1],\"row\":[10],\"word\":\"9\"},{\"rect\":{\"top\":464,\"left\":2,\"width\":80,\"height\":44},\"column\":[1],\"row\":[11],\"word\":\"10\"},{\"rect\":{\"top\":2,\"left\":82,\"width\":186,\"height\":46},\"column\":[2],\"row\":[1],\"word\":\"公司\"},{\"rect\":{\"top\":48,\"left\":82,\"width\":186,\"height\":46},\"column\":[2],\"row\":[2],\"word\":\" '\n",
      "                           'Waymo\"},{\"rect\":{\"top\":94,\"left\":82,\"width\":186,\"height\":46},\"column\":[2],\"row\":[3],\"word\":\"蚂蚁金服\"},{\"rect\":{\"top\":140,\"left\":82,\"width\":186,\"height\":46},\"column\":[2],\"row\":[4],\"word\":\"字节跳动(今日头条)\"},{\"rect\":{\"top\":186,\"left\":82,\"width\":186,\"height\":46},\"column\":[2],\"row\":[5],\"word\":\"阿里云\"},{\"rect\":{\"top\":232,\"left\":82,\"width\":186,\"height\":48},\"column\":[2],\"row\":[6],\"word\":\"滴滴出行\"},{\"rect\":{\"top\":280,\"left\":82,\"width\":186,\"height\":44},\"column\":[2],\"row\":[7],\"word\":\" '\n",
      "                           'JUUL '\n",
      "                           'Labs\"},{\"rect\":{\"top\":324,\"left\":82,\"width\":186,\"height\":46},\"column\":[2],\"row\":[8],\"word\":\"阿里本地生活\"},{\"rect\":{\"top\":370,\"left\":82,\"width\":186,\"height\":46},\"column\":[2],\"row\":[9],\"word\":\" '\n",
      "                           'Airbnb\"},{\"rect\":{\"top\":416,\"left\":82,\"width\":186,\"height\":48},\"column\":[2],\"row\":[10],\"word\":\" '\n",
      "                           'Stripe\"},{\"rect\":{\"top\":464,\"left\":82,\"width\":186,\"height\":44},\"column\":[2],\"row\":[11],\"word\":\"大疆无人机\"},{\"rect\":{\"top\":2,\"left\":268,\"width\":178,\"height\":46},\"column\":[3],\"row\":[1],\"word\":\"估值(亿美元)\"},{\"rect\":{\"top\":48,\"left\":268,\"width\":178,\"height\":46},\"column\":[3],\"row\":[2],\"word\":\"1750\"},{\"rect\":{\"top\":94,\"left\":268,\"width\":178,\"height\":46},\"column\":[3],\"row\":[3],\"word\":\"1538.46\"},{\"rect\":{\"top\":140,\"left\":268,\"width\":178,\"height\":46},\"column\":[3],\"row\":[4],\"word\":\"750\"},{\"rect\":{\"top\":186,\"left\":268,\"width\":178,\"height\":46},\"column\":[3],\"row\":[5],\"word\":\"670\"},{\"rect\":{\"top\":232,\"left\":268,\"width\":178,\"height\":48},\"column\":[3],\"row\":[6],\"word\":\"560\"},{\"rect\":{\"top\":280,\"left\":268,\"width\":178,\"height\":44},\"column\":[3],\"row\":[7],\"word\":\"380\"},{\"rect\":{\"top\":324,\"left\":268,\"width\":178,\"height\":46},\"column\":[3],\"row\":[8],\"word\":\"300\"},{\"rect\":{\"top\":370,\"left\":268,\"width\":178,\"height\":46},\"column\":[3],\"row\":[9],\"word\":\"293\"},{\"rect\":{\"top\":416,\"left\":268,\"width\":178,\"height\":48},\"column\":[3],\"row\":[10],\"word\":\"225\"},{\"rect\":{\"top\":464,\"left\":268,\"width\":178,\"height\":44},\"column\":[3],\"row\":[11],\"word\":\"220\"},{\"rect\":{\"top\":2,\"left\":446,\"width\":100,\"height\":46},\"column\":[4],\"row\":[1],\"word\":\"国家\"},{\"rect\":{\"top\":48,\"left\":446,\"width\":100,\"height\":46},\"column\":[4],\"row\":[2],\"word\":\"美国\"},{\"rect\":{\"top\":94,\"left\":446,\"width\":100,\"height\":46},\"column\":[4],\"row\":[3],\"word\":\"中国\"},{\"rect\":{\"top\":140,\"left\":446,\"width\":100,\"height\":46},\"column\":[4],\"row\":[4],\"word\":\"中国\"},{\"rect\":{\"top\":186,\"left\":446,\"width\":100,\"height\":46},\"column\":[4],\"row\":[5],\"word\":\"中国\"},{\"rect\":{\"top\":232,\"left\":446,\"width\":100,\"height\":48},\"column\":[4],\"row
\":[6],\"word\":\"中国\"},{\"rect\":{\"top\":280,\"left\":446,\"width\":100,\"height\":44},\"column\":[4],\"row\":[7],\"word\":\"美国\"},{\"rect\":{\"top\":324,\"left\":446,\"width\":100,\"height\":46},\"column\":[4],\"row\":[8],\"word\":\"中国\"},{\"rect\":{\"top\":370,\"left\":446,\"width\":100,\"height\":46},\"column\":[4],\"row\":[9],\"word\":\"美国\"},{\"rect\":{\"top\":416,\"left\":446,\"width\":100,\"height\":48},\"column\":[4],\"row\":[10],\"word\":\"美国\"},{\"rect\":{\"top\":464,\"left\":446,\"width\":100,\"height\":44},\"column\":[4],\"row\":[11],\"word\":\"中国\"},{\"rect\":{\"top\":2,\"left\":546,\"width\":138,\"height\":46},\"column\":[5],\"row\":[1],\"word\":\"领域\"},{\"rect\":{\"top\":48,\"left\":546,\"width\":138,\"height\":46},\"column\":[5],\"row\":[2],\"word\":\"智能科技\"},{\"rect\":{\"top\":94,\"left\":546,\"width\":138,\"height\":46},\"column\":[5],\"row\":[3],\"word\":\"金融科技\"},{\"rect\":{\"top\":140,\"left\":546,\"width\":138,\"height\":46},\"column\":[5],\"row\":[4],\"word\":\"文旅传媒\"},{\"rect\":{\"top\":186,\"left\":546,\"width\":138,\"height\":46},\"column\":[5],\"row\":[5],\"word\":\"企业服务\"},{\"rect\":{\"top\":232,\"left\":546,\"width\":138,\"height\":48},\"column\":[5],\"row\":[6],\"word\":\"汽车交通\"},{\"rect\":{\"top\":280,\"left\":546,\"width\":138,\"height\":44},\"column\":[5],\"row\":[7],\"word\":\"生活服务\"},{\"rect\":{\"top\":324,\"left\":546,\"width\":138,\"height\":46},\"column\":[5],\"row\":[8],\"word\":\"生活服务\"},{\"rect\":{\"top\":370,\"left\":546,\"width\":138,\"height\":46},\"column\":[5],\"row\":[9],\"word\":\"生活服务\"},{\"rect\":{\"top\":416,\"left\":546,\"width\":138,\"height\":48},\"column\":[5],\"row\":[10],\"word\":\"金融科技\"},{\"rect\":{\"top\":464,\"left\":546,\"width\":138,\"height\":44},\"column\":[5],\"row\":[11],\"word\":\"科业\"}]}]}',\n",
      "            'ret_code': 3,\n",
      "            'ret_msg': '已完成'}}\n"
     ]
    }
   ],
   "source": [
    "# Pretty-print the response\n",
    "import pprint\n",
    "pprint.pprint(r.json())"
   ]
  },
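  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When the result is requested as JSON, `result_data` is itself a JSON string whose `body` entries carry `row`, `column`, and `word` fields, as the output above shows. The cell below is a minimal sketch of turning those entries back into rows. It assumes 1-based indices and places each cell at the first index in its `row` and `column` lists, so merged cells are not expanded. The helper name `cells_to_rows` is hypothetical, and `sample` is a shortened excerpt of the output above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "\n",
    "def cells_to_rows(result_data):\n",
    "    \"\"\"Hypothetical helper: rebuild one list of rows per form from result_data.\"\"\"\n",
    "    tables = []\n",
    "    for form in json.loads(result_data)['forms']:\n",
    "        body = form.get('body', [])\n",
    "        if not body:\n",
    "            tables.append([])\n",
    "            continue\n",
    "        # Size the grid from the largest row and column index seen\n",
    "        n_rows = max(max(cell['row']) for cell in body)\n",
    "        n_cols = max(max(cell['column']) for cell in body)\n",
    "        grid = [['' for _ in range(n_cols)] for _ in range(n_rows)]\n",
    "        for cell in body:\n",
    "            # Indices are assumed to be 1-based; strip stray whitespace from the text\n",
    "            grid[cell['row'][0] - 1][cell['column'][0] - 1] = cell['word'].strip()\n",
    "        tables.append(grid)\n",
    "    return tables\n",
    "\n",
    "# Shortened excerpt of the result_data shown above\n",
    "sample = '{\"form_num\":1,\"forms\":[{\"body\":[{\"column\":[1],\"row\":[1],\"word\":\"序号\"},{\"column\":[2],\"row\":[1],\"word\":\"公司\"},{\"column\":[1],\"row\":[2],\"word\":\"1\"},{\"column\":[2],\"row\":[2],\"word\":\" Waymo\"}]}]}'\n",
    "cells_to_rows(sample)"
   ]
  },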
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
